Gastroesophageal reflux disease and non-alcoholic fatty liver disease: a two-sample Mendelian randomization combined with meta-analysis

Accumulating evidence from observational studies has suggested an association between gastroesophageal reflux disease (GERD) and non-alcoholic fatty liver disease (NAFLD). However, because such studies are prone to bias, we used Mendelian randomization (MR) to examine whether a causal association exists between the two diseases. The single nucleotide polymorphisms (SNPs) of GERD were retrieved from genome-wide association study (GWAS) datasets as the exposure. The SNPs of NAFLD were taken from the FinnGen dataset as the outcome. The relationship was analyzed with the inverse variance weighted, MR-Egger, and weighted median methods. We also used the MR-Egger intercept, Cochran’s Q test, leave-one-out analysis, MR-PRESSO, and the Steiger directionality test to evaluate the robustness of the causal association. A meta-analysis was then performed to give an overall estimate. Our MR and meta-analysis showed a causal relationship between GERD and NAFLD (OR 1.71, 95% CI 1.40–2.09; P < 0.0001).


Data sources and selection of instrumental variables
The genome-wide association study (GWAS) dataset and the FinnGen dataset were used to perform the MR analysis. To cover the most recent available data, we searched for GWAS data on gastroesophageal reflux disease (GERD) in the GWAS database (https://www.ebi.ac.uk/) and the FinnGen database (https://www.finngen.fi/en) [20][21][22] and obtained a total of 10 groups of data for further analysis (Supplemental Table 1). The NAFLD data were obtained from the FinnGen database, which included 2275 cases and 375,002 controls, and we retrieved 20,170,233 SNPs from it. All datasets were obtained from populations of European ancestry to avoid potential bias due to population stratification.
Mendelian randomization analysis requires the instrumental variables (IVs) to meet three key assumptions: (i) the IVs are strongly associated with the exposure (relevance); (ii) the IVs affect the outcome only through the exposure (exclusion restriction); (iii) the IVs are independent of confounders (independence) [18,19,23] (Fig. 1). First, we selected single nucleotide polymorphisms (SNPs) of GERD that met the genome-wide significance threshold (P < 5 × 10⁻⁸). The F-statistic was calculated, and F > 10 was required to ensure the strength of the association between the IVs and the exposure [24]. In addition, SNPs in linkage disequilibrium (LD; r² > 0.001) were removed, because LD violates the independence of the genetic variants [25,26]. We also eliminated palindromic variants and variants with incompatible alleles when harmonizing the exposure and outcome data. Finally, we searched the PhenoScanner database (http://www.phenoscanner.medschl.cam.ac.uk) to exclude confounder-associated SNPs, in line with assumption (iii), using a threshold of P < 1 × 10⁻⁶.
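The significance and instrument-strength filters described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the text does not state which F-statistic formula was used, so the common per-SNP approximation F = (beta / se)² is an assumption here, and the SNP identifiers and summary statistics are hypothetical.

```python
from dataclasses import dataclass

GENOME_WIDE_P = 5e-8  # significance threshold stated in the text
MIN_F = 10.0          # instrument-strength threshold stated in the text


@dataclass
class Snp:
    rsid: str
    beta: float  # SNP-exposure effect (log-odds scale)
    se: float    # standard error of beta
    pval: float  # SNP-exposure association P value


def f_statistic(snp: Snp) -> float:
    """Approximate per-SNP F statistic (assumed formula): (beta / se)^2."""
    return (snp.beta / snp.se) ** 2


def select_instruments(snps):
    """Keep SNPs that pass both the P-value and the F-statistic filter."""
    return [s for s in snps if s.pval < GENOME_WIDE_P and f_statistic(s) > MIN_F]


# Hypothetical summary statistics, for illustration only.
candidates = [
    Snp("rs_example_1", 0.030, 0.0040, 1e-12),
    Snp("rs_example_2", 0.010, 0.0040, 1.2e-2),
    Snp("rs_example_3", 0.025, 0.0045, 3e-8),
]
kept = select_instruments(candidates)
print([s.rsid for s in kept])
```

LD clumping, allele harmonization, and the confounder lookup depend on reference data and are not covered by this sketch.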

Statistical analysis for Mendelian randomization
To analyze the possible causal effect of GERD on NAFLD, we chose the inverse variance weighted (IVW) method as the primary analysis. When multiple genetic variants are available, IVW combines the variant-specific effects in a weighted average to provide an overall causal estimate [16]. MR-Egger extends the IVW method by allowing for horizontal pleiotropy, which it assesses through the MR-Egger intercept [27]. The weighted median method can provide valid MR estimates even when up to half of the weight comes from invalid instruments [28]. The simple mode and weighted mode estimates can remain consistent even if the majority of instruments are invalid. Their power is lower than that of IVW and the weighted median, but higher than that of MR-Egger [29]. The other four methods (MR-Egger, weighted median, simple mode, and weighted mode) therefore hold under weaker assumptions at the cost of some statistical power. Accordingly, we regarded them as supplementary analyses, and considered the MR result meaningful only when the effect directions of all five methods were consistent.
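To make the contrast between IVW and MR-Egger concrete, the following is a minimal sketch under stated assumptions: first-order weights for the Wald ratios, a fixed-effect standard error for IVW, and hypothetical SNP effects constructed with a true causal effect of 0.5. It is not the TwoSampleMR implementation, whose defaults may differ.

```python
import math


def ivw(bx, by, se_y):
    """Inverse variance weighted estimate from per-SNP Wald ratios."""
    ratios = [y / x for x, y in zip(bx, by)]
    # First-order weights: 1 / (se_y / |bx|)^2 = (bx / se_y)^2
    weights = [(x / s) ** 2 for x, s in zip(bx, se_y)]
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    return estimate, math.sqrt(1.0 / total)


def mr_egger(bx, by, se_y):
    """Weighted regression of by on bx with an intercept (weights 1/se_y^2)."""
    # Orient every SNP so that its exposure effect is positive.
    signs = [1.0 if x >= 0 else -1.0 for x in bx]
    x = [s * v for s, v in zip(signs, bx)]
    y = [s * v for s, v in zip(signs, by)]
    w = [1.0 / s ** 2 for s in se_y]
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    slope = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
    intercept = (swy - slope * swx) / sw
    return slope, intercept


# Hypothetical SNP effects: true causal effect of 0.5 on the log-odds scale.
bx = [0.02, 0.03, 0.05, 0.04]
se_y = [0.005, 0.006, 0.004, 0.005]
by_valid = [0.5 * x for x in bx]
by_pleiotropic = [y + 0.01 for y in by_valid]  # constant directional pleiotropy

ivw_valid, ivw_se = ivw(bx, by_valid, se_y)
ivw_biased, _ = ivw(bx, by_pleiotropic, se_y)
egger_slope, egger_intercept = mr_egger(bx, by_pleiotropic, se_y)
print(ivw_valid, ivw_biased, egger_slope, egger_intercept)
```

In this constructed example, IVW recovers 0.5 when all instruments are valid and is biased upward once a constant pleiotropic effect is added, whereas the MR-Egger slope stays at 0.5 and the intercept absorbs the pleiotropic term.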
To test the robustness of the MR findings, heterogeneity and pleiotropy tests are indispensable. We used Cochran's Q test to assess the heterogeneity of the IVW estimate. A leave-one-out analysis was also performed to test whether the potential causal association between exposure and outcome was driven by a single SNP. MR-PRESSO and MR-Egger were used to test for pleiotropy. The MR-Egger intercept estimates the average pleiotropic effect; a non-zero intercept indicates that the IVW estimate may be biased by pleiotropy. The conclusion was considered valid only when the P values of all of the above tests exceeded 0.05. Finally, to rule out reverse causation, the Steiger directionality test was used to confirm the direction of the MR findings [30].
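Two of these checks, Cochran's Q and the leave-one-out analysis, can be sketched in a few lines. The example assumes first-order IVW weights and uses hypothetical SNP effects; it reports Q and its degrees of freedom only, since the P value would come from a chi-square distribution with n - 1 degrees of freedom.

```python
def ivw_estimate(bx, by, se_y):
    """IVW estimate from Wald ratios with first-order weights (bx / se_y)^2."""
    weights = [(x / s) ** 2 for x, s in zip(bx, se_y)]
    ratios = [y / x for x, y in zip(bx, by)]
    return sum(w * r for w, r in zip(weights, ratios)) / sum(weights)


def cochran_q(bx, by, se_y):
    """Cochran's Q: weighted squared deviations of Wald ratios from the IVW estimate."""
    theta = ivw_estimate(bx, by, se_y)
    q = sum(((x / s) ** 2) * (y / x - theta) ** 2 for x, y, s in zip(bx, by, se_y))
    return q, len(bx) - 1  # statistic and degrees of freedom


def leave_one_out(bx, by, se_y):
    """Recompute the IVW estimate after dropping each SNP in turn."""
    estimates = []
    for i in range(len(bx)):
        keep = [j for j in range(len(bx)) if j != i]
        estimates.append(
            ivw_estimate([bx[j] for j in keep], [by[j] for j in keep], [se_y[j] for j in keep])
        )
    return estimates


# Hypothetical SNP-exposure and SNP-outcome effects, for illustration only.
bx = [0.02, 0.03, 0.05, 0.04, 0.06]
by = [0.011, 0.014, 0.027, 0.019, 0.031]
se_y = [0.005, 0.006, 0.004, 0.005, 0.006]

q, df = cochran_q(bx, by, se_y)
loo = leave_one_out(bx, by, se_y)
print(q, df, loo)
```

If every leave-one-out estimate keeps the same sign as the full estimate, no single SNP is driving the direction of the result.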
All tests in the MR analyses were two-sided and were performed in RStudio (version 4.1.3) with the TwoSampleMR package (version 0.5.8). Full documentation is available at https://mrcieu.github.io/TwoSampleMR. The study design is shown in Fig. 1.

Statistical analysis for meta-analysis
For the three groups derived from the MR analysis, we then performed a meta-analysis to estimate the pooled odds ratio (OR) and 95% confidence interval (CI). If heterogeneity was present, the OR was calculated with a random-effects model; otherwise, a fixed-effects model was used. Heterogeneity was assessed with the I² statistic. The results were deemed homogeneous only when I² < 50% and P > 0.05.
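The pooling and model-selection rule can be illustrated with a short sketch. It assumes inverse variance weighting on the log-OR scale, Wald-type 95% CIs, and three hypothetical study estimates; it is not the meta package's implementation, and the random-effects estimator itself is not shown.

```python
import math

Z_95 = 1.96  # normal quantile for a 95% confidence interval


def pool_fixed(studies):
    """Fixed-effect (inverse variance) pooling of (OR, lower, upper) tuples."""
    logs = [math.log(o) for o, _, _ in studies]
    # Recover each standard error from the width of the CI on the log scale.
    ses = [(math.log(hi) - math.log(lo)) / (2 * Z_95) for _, lo, hi in studies]
    weights = [1.0 / s ** 2 for s in ses]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, logs)) / total
    pooled_se = math.sqrt(1.0 / total)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, logs))
    df = len(studies) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {
        "or": math.exp(pooled),
        "ci": (math.exp(pooled - Z_95 * pooled_se), math.exp(pooled + Z_95 * pooled_se)),
        "q": q,
        "df": df,
        "i2": i2,
    }


def choose_model(i2, p_het):
    """Rule from the text: fixed effects only if I^2 < 50% and P > 0.05."""
    return "fixed" if i2 < 50.0 and p_het > 0.05 else "random"


# Hypothetical (OR, lower, upper) estimates for three groups, for illustration only.
studies = [(1.80, 1.30, 2.49), (1.60, 1.10, 2.33), (1.70, 1.20, 2.41)]
result = pool_fixed(studies)
# With three studies Q has 2 degrees of freedom, where the chi-square
# survival function has the closed form exp(-Q / 2).
p_het = math.exp(-result["q"] / 2)
model = choose_model(result["i2"], p_het)
print(result, p_het, model)
```

The closed-form heterogeneity P value holds only for the three-study case; other study counts would need a general chi-square survival function.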
The meta-analysis was performed in RStudio (version 4.1.3) with the meta package.

Ethical approval
No additional ethics approval was needed because all data in the study were previously collected, analyzed, and published.

Results
Instrumental variables for MR
After screening for genome-wide significance (P value), LD (r² < 0.001), and instrument strength (F > 10), only three groups of exposure data were retained for the MR and meta-analysis (Table 1). The other seven groups were discarded because their F values did not satisfy the requirement. Moreover, to include more SNPs that contribute to GERD, a relatively relaxed threshold was applied, with P < 5 × 10⁻⁷ for Group 2 (GWAS ID: ebi-a-GCST90018848) and P < 5 × 10⁻⁶ for Group 3 (GWAS ID: finn-b-K11_REFLUX), an approach that has been used in many previous MR studies [31]. We also removed SNPs associated with confounders of NAFLD reported in the literature (obesity, hypertension, type II diabetes mellitus, dietary habits, physical activity, and socioeconomic factors) [8]. Lastly, suitable IVs were generated (Group 1: 22 SNPs; Group 2: 7 SNPs; Group 3: 29 SNPs) for the subsequent MR analysis (Supplemental Table 2).

Mendelian randomization analysis
The IVW analysis showed that GERD was associated with a higher risk of NAFLD (Group 1: OR = 1.81; 95% CI ...). Leave-one-out analysis was used to recalculate the MR estimate from the remaining IVs after removing each SNP in turn. The β value remained above zero regardless of which genetic variant was removed, suggesting that the positive association was not driven by any single variant. The Steiger test confirmed the direction of our analysis and showed no evidence of reverse causality (all P < 0.001) (Table 2, Figs. 2, 3 and Supplemental Figs. 1, 2, 3).
Of note, the potential sample overlap between databases (Group 2 and Group 3) is worth explaining. The exposure and outcome data of Group 2 [20] may both include some samples from FinnGen, while for Group 3 both come from FinnGen. The influence of sample overlap, which can introduce weak instrument bias, can be gauged by the Type 1 error rate. We used the online tool built by Burgess et al. [32] (https://sb452.shinyapps.io/overlap) to calculate the probability of a Type 1 error; the rates of the two groups (Group 1, Group 2) were both approximately 5%, which is acceptable and indirectly reflects a low degree of overlap.

Meta-analysis
Taking IVW as the primary method, we conducted a meta-analysis of the IVW estimates to obtain an overall result. The result did not show any statistical heterogeneity (I² = 0.0%; H = 1.00; P = 0.81). Accordingly, the fixed-effects model was chosen to pool the results (OR 1.71, 95% CI 1.40-2.09; P < 0.0001) (Fig. 4 and Supplemental Table 7). Meanwhile, Egger's test (P = 0.26), Begg's test (P = 0.33), and visual inspection of the funnel plot showed no evidence of publication bias (Supplemental Fig. 4).

Discussion
To our knowledge, this study is the first MR analysis to examine the causal relationship between GERD and NAFLD. Using the GWAS and FinnGen databases, the combined MR and meta-analysis showed that the odds of NAFLD were about 71% higher in people with genetically predicted GERD (OR 1.71, 95% CI 1.40-2.09; P < 0.0001) [10]. This result is also consistent with the findings of Wijarnpreecha and Lee [13,33].
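As a rough consistency check on the pooled estimate, the sketch below converts the reported OR and 95% CI to the log scale, recovers an approximate standard error, and expresses the OR as a percentage increase in odds. It assumes a Wald-type CI and a two-sided normal test, which the text does not state explicitly.

```python
import math

# Pooled estimate reported in the text.
pooled_or, ci_low, ci_high = 1.71, 1.40, 2.09

# Assumed Wald-type CI: log(OR) +/- 1.96 * SE.
log_or = math.log(pooled_or)
se = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)

# Two-sided P value from the standard normal distribution.
z = log_or / se
p_value = math.erfc(abs(z) / math.sqrt(2))

# An OR of 1.71 means the odds are 71% higher, not that the risk is 71%.
percent_higher_odds = (pooled_or - 1) * 100

print(f"z = {z:.2f}, P = {p_value:.1e}, odds higher by {percent_higher_odds:.0f}%")
```

Under these assumptions the recovered P value falls well below 0.0001, in line with the reported result.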
The link between GERD and NAFLD has not been entirely understood. Several possible explanations may account for it.
There is evidence that people with GERD are prone to developing insulin resistance (IR) [34]. Acid reflux stimulates esophageal epithelial cells to produce chemokines, which subsequently activate various immune cells. These activated cells then produce numerous pro-inflammatory cytokines such as IL-6, IL-1, and TNF-alpha. On the one hand, these cytokines give rise to insulin resistance, so that insulin loses its ability to inhibit apoptosis. Ultimately, NAFLD develops as a result of the accumulation of free fatty acids in the liver [35,36]. On the other hand, cytokines tend to recruit Kupffer cells, which take part in inflammation and suppress lipid metabolism, ultimately leading to NAFLD [37].
Reactive oxygen species (ROS) have been reported to play an important role in the pathogenesis of various gastrointestinal (GI) diseases, including GERD and liver cirrhosis [38,39]. Acid exposure and the subsequent ROS generation cause mucosal damage [35]. At the same time, ROS can be detrimental to the liver and accelerate the pathogenesis of NAFLD [40].
There is evidence that visceral obesity, but not BMI, is closely correlated with the development of GERD and NAFLD [11,41-43]. Visceral adipose tissue (VAT) accumulated in the abdomen increases intragastric pressure, which contributes to abnormal acid reflux and eventually to GERD [44,45]. In addition, the accumulation of free fatty acids in the liver and the adipocytokine dysregulation induced by VAT foster chronic inflammation and promote the pathogenesis of NAFLD [11,46].
Additionally, GERD patients tend to take proton pump inhibitors (PPIs), the most effective treatment. In recent years, numerous studies have reported that long-term use of PPIs may induce intestinal dysbiosis, altering the composition of the gut microbiota and promoting bacterial overgrowth [47]. Under these conditions, bacterial translocation from the intestinal lumen to the hepatic circulation increases, causing an inflammatory response in the liver and worsening its condition. Finally, as mentioned above, inflammatory cytokines promote the development of NAFLD [48].

Limitations
Some limitations exist in this study. First, due to the lack of data on other populations, the genetic variants for the exposure and the outcome were both retrieved from populations of European ancestry. The causal relationship found here therefore cannot be extended to other populations, which limits its generalizability. However, restricting the analysis to one population avoids bias from ancestry differences between the exposure and outcome samples. Second, although MR analysis avoids drawbacks of observational studies such as reverse causality and residual confounding, it is susceptible to pleiotropy. The weighted median mitigates this problem because it remains valid even if up to half of the SNPs are invalid [22,23]. Moreover, the MR-Egger intercept test was included to detect bias from pleiotropy. Third, although we removed the SNPs associated with confounders according to assumption (iii), this selection was based on the current GWAS database. We acknowledge that the analysis cannot control for unknown confounders, which may violate assumption (iii) and bias the analysis. Fourth, the study has only demonstrated, based on existing databases, that people with GERD are more likely to develop NAFLD. The specific mechanisms linking the two diseases warrant further examination. Finally, we used the large-sample datasets that are currently available. As the databases grow and are updated, the results may change accordingly.

Conclusion
In conclusion, our MR study combined with meta-analysis indicated a credible causal association between GERD and NAFLD. Considering the high prevalence of both diseases, further investigations may contribute to the development of new prevention and treatment strategies.

Figure 1. Graphical depiction of the study design and MR assumptions in a two-sample MR design: (i) relevance; (ii) exclusiveness; (iii) independence.

Figure 2. The MR results of the causal effect of GERD on NAFLD (with forest plot).

Figure 3. Scatter plots and funnel plots of the genetic associations with GERD against the genetic associations with NAFLD. (A) and (B) The scatter plot and funnel plot of the association between GERD (id: ebi-a-GCST90000514) and NAFLD. (C) and (D) The scatter plot and funnel plot of the association between GERD (id: ebi-a-GCST90018848) and NAFLD. (E) and (F) The scatter plot and funnel plot of the association between GERD (id: finn-b-K11_REFLUX) and NAFLD.

Table 1. Description of the data sources of exposure.

Table 2. The heterogeneity and pleiotropy tests of the MR analysis.